Background

* Started with a Weston patch proposal
* Many strong views
* Much time invested in current software and APIs
* Thank you for keeping discussions civil
* Many areas for improvement identified


Problem Space

* Device-accessible Surface Allocation in Userspace
* Surface Handles
* Surface State/Layout Management
* Synchronization

Goals

* Consensus-based, forward-looking APIs
* Window System, Kernel, and Vendor Agnostic
* Minimal, Optimal driver interface
* Final destination: Optimized scene graphs for every frame


Prior Art: GBM

* Provides: Allocation, Arbitration, Handles
* Benefits:
  * Incorporated in many codebases now
  * Widely deployed and well exercised
  * Minimal API & implementation
  * Allocation-time usage specification for supported usages
* Current Shortcomings:
  * Process-local handles only. Can import external handles, but not export
  * Currently very GPU-focused
  * Arbitration is within device scope


Prior Art: Chrome OS/Freon

* Attempted to add surface state management to GBM/EGLImage
* Failed to reach consensus on an optimal design
* Major point of contention: Level of abstraction


Prior Art: Gralloc

* Provides: Allocation, Arbitration, Handles
* Synchronization via Android/Linux fence FDs
* Out-of-process handles require other components
* Benefits:
  * Deployed, proven in field
  * Allocation-time usage specification
  * Support for non-graphics usage
* Current Shortcomings:
  * No explicit surface state management
  * Limited, usage-flag-based arbitration abilities
  * Open Source, but proprietary API

Prior Art: EGLStream

* Provides: Allocation, Arbitration, Handles, State Management, Synchronization
* Benefits:
  * Deployed, proven in field
  * Portable
  * Comprehensive feature set and extensible
* Current Shortcomings:
  * Open standard, but single vendor implementation in practice
  * No cross-device support
  * It is EGL-based
  * Too much encapsulation
  * Behavior loosely defined or undefined in some cases

Prior Art: DMA-BUF

* Provides: Handles
* Benefits:
  * Supported by non-graphics devices
* Current Shortcomings:
  * No centralized userspace allocation API
  * Linux-only
  * Does not describe content layout
  * No arbitration
  * Limited or no allocation-time usage specification

Prior Art: Vulkan

* Provides: Allocation, Detailed Usage, State Management, Synchronization
* Benefits:
  * Allocation-time usage specification for graphics/compute
  * Image state management
  * Extensible
  * Portable
* Current Shortcomings:
  * No Unix cross-process/cross-API/cross-device handles or arbitration
  * Graphics/Compute and Display only

Important features identified

* Minimalism
* Portability
* Support for non-graphics devices
* Optimal performance in steady state
  * Allocation-time usage specification
  * Driver-negotiated image capabilities
* Good performance during usage transitions
  * Multiple usages per image without reallocation
  * Image layout transitions


Path Forward

* Suggest a focus on solving problems, rather than picking a winner from existing APIs
* Focus on cross-driver, cross-engine, cross-device image/texture arbitration first
  * This has historically been where everything falls apart
  * Simpler cases fall out naturally from this
  * State transitions are also easier with well-described end points
* Also, Jason Ekstrand has put together some proposals for this

Assumptions

For the sake of simplifying initial discussions:
1. Assume we are designing an ideal allocation API from scratch
2. Think in terms of userspace API first
3. Both API elegance and hardware capabilities are important


Image Sharing Proposals

* Define extensible capability descriptor lists
  * Similar concept to the Khronos data format spec, but describing properties other than sub-pixel data layout and interpretation
* Lists of capabilities could be queried from each "driver"
  * List could be large. Some filtering mechanism would be employed
* Centralized mechanism keeps the capability namespaces mutually exclusive
  * Could be a file in a git repo, Khronos, etc. Anything authoritative
* Image creation function intersects capabilities of relevant drivers
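
One way to picture the namespace idea is to treat the (vendor, property name) pair as a single globally unique key, with the vendor half assigned by the authoritative registry and the property half managed by each vendor. The sketch below assumes 16-bit fields for both halves, as in the header on the backup slides. `cap_key`, `registry_entry_t`, and `registry_is_consistent` are hypothetical names introduced here for illustration, not part of any proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Combine the centrally assigned vendor ID and the vendor-managed
 * property name into one 32-bit key (hypothetical helper). */
static uint32_t cap_key(uint16_t vendor, uint16_t property_name)
{
    return ((uint32_t)vendor << 16) | property_name;
}

/* One line of a hypothetical authoritative registry, for example a
 * file kept in a git repository. */
typedef struct registry_entry {
    uint16_t vendor;
    const char *owner;
} registry_entry_t;

/* The registry is consistent when no vendor ID is assigned twice. */
static bool registry_is_consistent(const registry_entry_t *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (entries[i].vendor == entries[j].vendor)
                return false;
    return true;
}
```

With unique vendor IDs, two vendors can both use property name 0x0000 without their keys colliding.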


Proposal: How are capabilities filtered?

* Describe the desired usage
  * Examples of usage: Format, operations, dimensions
* Leads to next question: How is usage described?
  * Make use of Khronos data format spec for formats
  * Some usage data, such as width/height, have obvious representations
  * Other data lend themselves to boolean flags, like those in Gralloc
  * Some usage is specific to certain devices or engines
    * Each driver ignores usages targeted only at other drivers
    * Special device/engine target for basic usage properties: ALL
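
The per-device targeting rule can be sketched as follows: each usage entry names a target, and a driver keeps only the entries aimed at itself or at ALL. The slides do not say how a target or the ALL value is encoded, so the integer `device_id_t`, the `DEVICE_ALL` sentinel, and the `usage_entry_t` layout are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int device_id_t;               /* hypothetical device identifier */
#define DEVICE_ALL ((device_id_t)-1)   /* hypothetical encoding of ALL */

typedef struct usage_entry {
    device_id_t target;    /* device/engine this usage is aimed at */
    uint16_t vendor;       /* namespace of the usage name */
    uint16_t usage_name;   /* e.g. texture or display */
} usage_entry_t;

/* A usage applies to a driver if it targets that driver or ALL. */
static bool usage_applies(const usage_entry_t *u, device_id_t self)
{
    return u->target == DEVICE_ALL || u->target == self;
}

/* Copy the usages that driver `self` must honor into `out` and return
 * how many were kept. `out` must have room for `n` entries. */
static size_t select_usages(const usage_entry_t *list, size_t n,
                            device_id_t self, usage_entry_t *out)
{
    assert(list != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (usage_applies(&list[i], self))
            out[kept++] = list[i];
    return kept;
}
```

A driver that is not named anywhere in the list still sees the ALL entries, which is what makes ALL suitable for basic properties.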
Proposal: How are capabilities intersected?

* First pass: Each driver eliminates incompatible capabilities
  * Unrecognized or vendor-specific capabilities are inherently incompatible
  * E.g., Intel driver would trivially eliminate all NVIDIA tiling formats
* Second pass: Sort the remaining capabilities
  * Correct sorting is implementation and usage dependent
  * Therefore, must be done by a driver, not common framework
  * Which driver? Straw-man proposal: Let the app decide.
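
The two passes could look roughly like this. It is a minimal sketch under several assumptions: capabilities are reduced to their (vendor, property name) identifiers, each driver exposes a `supports` hook for the first pass and a `rank` hook for the second, and the two toy drivers and vendor IDs `VENDOR_A`/`VENDOR_B` are invented for the example. None of these names come from the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { VENDOR_BASE = 0x0000, VENDOR_A = 0x0001, VENDOR_B = 0x0002 };

typedef struct cap_id {
    uint16_t vendor;
    uint16_t property_name;
} cap_id_t;

/* Hypothetical per-driver hooks. */
typedef struct driver {
    bool (*supports)(cap_id_t cap); /* first pass: compatibility check */
    int  (*rank)(cap_id_t cap);     /* second pass: lower is preferred */
} driver_t;

/* First pass: keep only capabilities that every driver accepts.
 * Compacts `caps` in place and returns the new count. */
static size_t eliminate_incompatible(cap_id_t *caps, size_t n,
                                     const driver_t *drivers, size_t n_drivers)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        bool ok = true;
        for (size_t d = 0; d < n_drivers && ok; d++)
            ok = drivers[d].supports(caps[i]);
        if (ok)
            caps[kept++] = caps[i];
    }
    return kept;
}

/* qsort() takes no context argument, so the app-selected driver is
 * passed through a file-scope pointer (not thread-safe). */
static const driver_t *sorting_driver;

static int compare_caps(const void *a, const void *b)
{
    int ra = sorting_driver->rank(*(const cap_id_t *)a);
    int rb = sorting_driver->rank(*(const cap_id_t *)b);
    return (ra > rb) - (ra < rb);
}

/* Second pass: order the survivors by the chosen driver's preference. */
static void sort_caps(cap_id_t *caps, size_t n, const driver_t *sorter)
{
    assert(sorter != NULL && sorter->rank != NULL);
    sorting_driver = sorter;
    qsort(caps, n, sizeof *caps, compare_caps);
}

/* Toy drivers: each understands base capabilities plus its own vendor's. */
static bool a_supports(cap_id_t c) { return c.vendor == VENDOR_BASE || c.vendor == VENDOR_A; }
static bool b_supports(cap_id_t c) { return c.vendor == VENDOR_BASE || c.vendor == VENDOR_B; }
/* Prefer vendor-specific layouts, then lower-numbered base properties. */
static int toy_rank(cap_id_t c) { return c.vendor != VENDOR_BASE ? 0 : 1 + c.property_name; }

static const driver_t driver_a = { a_supports, toy_rank };
static const driver_t driver_b = { b_supports, toy_rank };
```

When both toy drivers take part, only the base capabilities survive the first pass. With driver A alone, its vendor-specific capability survives and sorts first.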


Proposal: Describing allocation result

* After an image is created, its chosen properties must be described
* Can chosen capability data double as property definitions?


Image Capabilities vs. Memory Capabilities

* Thus far, focused on image-level capabilities
* What about memory-level capabilities?
  * e.g., contiguous requirement
* Image capability mechanism should generalize to describe these
* Might be a separate but symmetric step in allocation machine


Questions?

Backup Slides

#define VENDOR_BASE 0x0000
// Remaining Vendor Namespace: 0x0001-0xFFFF

struct header {
    uint16_t vendor;
    uint16_t property_name;
    uint32_t length_in_words; // payload size in 32-bit words, not counting this header
};

typedef struct header capability_header_t;
typedef struct header usage_header_t;
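
Because every descriptor starts with the same header, a consumer can step over descriptors it does not understand. The sketch below walks a packed list this way. It assumes descriptors are stored back to back in a buffer of 32-bit words and that `length_in_words` counts payload words only, which is inferred from the example initializers in the slides, not stated by them. `put_descriptor` and `find_descriptor` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the header from the slides. */
typedef struct header {
    uint16_t vendor;
    uint16_t property_name;
    uint32_t length_in_words; /* assumed: payload words, header excluded */
} header_t;

#define HEADER_WORDS (sizeof(header_t) / sizeof(uint32_t))

/* Append one descriptor (header plus payload) to a word buffer and
 * return the number of words written. */
static size_t put_descriptor(uint32_t *dst, header_t h, const uint32_t *payload)
{
    assert(h.length_in_words == 0 || payload != NULL);
    memcpy(dst, &h, sizeof h);
    if (h.length_in_words != 0)
        memcpy(dst + HEADER_WORDS, payload,
               h.length_in_words * sizeof(uint32_t));
    return HEADER_WORDS + h.length_in_words;
}

/* Return the word offset of the first descriptor matching
 * (vendor, name) in a packed list of `total_words` words, or -1. */
static long find_descriptor(const uint32_t *list, size_t total_words,
                            uint16_t vendor, uint16_t name)
{
    size_t pos = 0;
    while (pos + HEADER_WORDS <= total_words) {
        header_t h;
        memcpy(&h, list + pos, sizeof h); /* avoids aliasing issues */
        if (h.vendor == vendor && h.property_name == name)
            return (long)pos;
        pos += HEADER_WORDS + h.length_in_words; /* skip unknown payload */
    }
    return -1;
}
```

The walker never interprets a payload, so a driver can search a list containing another vendor's descriptors without knowing their layout.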


Code: Capabilities

#define CAP_BASE_PITCH_LINEAR 0x0000 // upstream-controlled namespace
typedef struct capability_pitch_linear {
    capability_header_t header; // { VENDOR_BASE, CAP_BASE_PITCH_LINEAR, 1 }
    uint32_t min_stride_in_bytes;
} capability_pitch_linear_t;


#define CAP_NVIDIA_TILED 0x0000 // NV-specific namespace
typedef struct capability_nvidia_tiled {
    capability_header_t header; // { VENDOR_NVIDIA, CAP_NVIDIA_TILED, 1 }
    uint16_t tile_width;
    uint16_t tile_height;
} capability_nvidia_tile_format_t;


#define CAP_NVIDIA_COMPRESSED 0x0001 // NV-specific namespace
typedef struct capability_nvidia_compressed {
    capability_header_t header; // { VENDOR_NVIDIA, CAP_NVIDIA_COMPRESSED, 1 }
    uint32_t compressed;
} capability_nvidia_compressed_t;


Code: Usage

#define USAGE_BASE_TEXTURE 0x0000 // upstream-controlled namespace
typedef struct usage_texture {
    usage_header_t header; // { VENDOR_BASE, USAGE_BASE_TEXTURE, 0 }
} usage_texture_t;


#define USAGE_BASE_DISPLAY 0x0001 // upstream-controlled namespace
typedef struct usage_display {
    usage_header_t header; // { VENDOR_BASE, USAGE_BASE_DISPLAY, 0 }
} usage_display_t;


#define USAGE_NVIDIA_DISPLAY 0x0000 // NV-specific namespace
typedef struct usage_nvidia_display {
    usage_header_t header; // { VENDOR_NVIDIA, USAGE_NVIDIA_DISPLAY, 1 }
    uint32_t rotation;
} usage_nvidia_display_t;


Code: App-supplied usage lists

// Opaque device handle. The pointee type is illegible in the source;
// "struct device" is an assumption.
typedef struct device* device_t;

typedef struct usage {
    device_t dev;
    usage_header_t usage;
} usage_t;

// Opaque surface handle. The pointee type is illegible in the source;
// "struct surface" is an assumption.
typedef struct surface* surface_t;

// Application-facing
AllocSurface(device_t primary_device,
             uint32_t width,
             uint32_t height,
             const uint32_t* khr_data_format,
             uint32_t usage_list_length,
             const usage_t* usage_list,
             surface_t* surface_out);


Code: Driver-side Usage

typedef struct driver_api {
    // Report the capabilities this device supports for the requested usage.
    void (*get_capabilities)(device_t dev,
        uint32_t width, uint32_t height, const uint32_t* khr_data_format,
        uint32_t usage_list_length,
        const usage_t* usage_list,
        uint32_t* capability_list_length_out,
        capability_header_t** capability_list_out);

    // Reduce an incoming capability list to the entries this device accepts.
    const capability_header_t* (*filter_capabilities)(device_t dev,
        uint32_t width, uint32_t height, const uint32_t* khr_data_format,
        uint32_t usage_list_length,
        const usage_t* usage_list,
        uint32_t capability_list_length_in,
        const capability_header_t* capability_list_in,
        uint32_t* capability_list_length_out,
        capability_header_t** capability_list_out);

    // Allocate a surface using the agreed capability list.
    surface_t (*alloc_surface)(device_t dev,
        uint32_t width, uint32_t height, uint32_t* khr_data_format,
        uint32_t usage_list_length,
        const usage_t* usage_list,
        uint32_t capability_list_length,
        capability_header_t* capability_list);
} driver_api_t;